Buffered Bloom Filters on Solid State Storage

Authors

  • Mustafa Canim
  • George A. Mihaila
  • Bishwaranjan Bhattacharjee
  • Christian A. Lang
  • Kenneth A. Ross
Abstract

Bloom filters are widely used in many applications, including database management systems. With a certain allowable error rate, this data structure provides an efficient solution for membership queries. The error rate decreases as the size of the Bloom filter grows. Currently, Bloom filters are stored in main memory because the low locality of their operations makes them impractical on secondary storage. In multi-user database management systems, where there is high contention for the shared memory heap, the limited memory available for allocating a Bloom filter may cause a high rate of false positives. In this paper we propose a technique to reduce the memory requirement of Bloom filters with the help of solid state storage devices (SSDs). By using a limited memory space for buffering the read/write requests, we can afford a larger SSD space for the actual Bloom filter bit vector. Our experiments show that, with a significantly smaller memory requirement and fewer hash functions, the proposed technique effectively reduces the false positive rate. In addition, the proposed data structure runs faster than traditional Bloom filters by grouping the inserted records with respect to their locality on the…
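The buffering idea in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the paper's design: the page layout (one hash selects an SSD page and k double-hashed positions are chosen inside it, so each key touches a single page), the simulated SSD (a list of in-memory `bytearray` pages), and the names `BufferedBloomFilter`, `add`, `flush`, `might_contain` and `page_writes` are all hypothetical. The block also includes the textbook false-positive estimate (1 - e^(-kn/m))^k, which shows why a larger bit vector of m bits lowers the error rate for n items and k hash functions.

```python
import hashlib
import math


def false_positive_rate(m_bits: int, n_items: int, k: int) -> float:
    """Textbook estimate (1 - e^(-k*n/m))^k for a standard Bloom filter."""
    return (1.0 - math.exp(-k * n_items / m_bits)) ** k


class BufferedBloomFilter:
    """Hypothetical sketch: bit vector split into pages on a simulated SSD.

    Assumption (not necessarily the paper's scheme): one hash picks a page and
    k hashes pick bits inside it. Inserts are buffered in RAM per page and
    written back in page order, one read-modify-write per dirty page.
    """

    def __init__(self, num_pages: int, page_bits: int, k: int, buffer_capacity: int):
        assert page_bits % 8 == 0, "page_bits must be a whole number of bytes"
        self.num_pages = num_pages
        self.page_bits = page_bits
        self.k = k
        self.buffer_capacity = buffer_capacity  # max pending bit offsets in RAM
        # Simulated SSD: one bytearray per page (stand-in for real page I/O).
        self.ssd = [bytearray(page_bits // 8) for _ in range(num_pages)]
        self.buffer: dict[int, set[int]] = {}  # page -> pending bit offsets
        self.pending = 0                       # total pending bit offsets
        self.page_writes = 0                   # simulated SSD page writes

    def _locate(self, key: str) -> tuple[int, list[int]]:
        digest = hashlib.sha256(key.encode()).digest()
        page = int.from_bytes(digest[:4], "big") % self.num_pages
        # Double hashing inside the page: h1 + i*h2, with h2 forced odd.
        h1 = int.from_bytes(digest[4:12], "big")
        h2 = int.from_bytes(digest[12:20], "big") | 1
        return page, [(h1 + i * h2) % self.page_bits for i in range(self.k)]

    def add(self, key: str) -> None:
        page, bits = self._locate(key)
        pending = self.buffer.setdefault(page, set())
        before = len(pending)
        pending.update(bits)
        self.pending += len(pending) - before
        if self.pending >= self.buffer_capacity:
            self.flush()

    def flush(self) -> None:
        # Apply buffered bits grouped by page: each dirty page is written once.
        for page in sorted(self.buffer):
            data = self.ssd[page]
            for b in self.buffer[page]:
                data[b >> 3] |= 1 << (b & 7)
            self.page_writes += 1
        self.buffer.clear()
        self.pending = 0

    def might_contain(self, key: str) -> bool:
        # Check both the RAM buffer and the SSD page, so no false negatives.
        page, bits = self._locate(key)
        pending = self.buffer.get(page, set())
        data = self.ssd[page]
        return all(b in pending or (data[b >> 3] >> (b & 7)) & 1 for b in bits)


if __name__ == "__main__":
    bf = BufferedBloomFilter(num_pages=64, page_bits=4096, k=4, buffer_capacity=2048)
    keys = [f"key-{i}" for i in range(1000)]
    for key in keys:
        bf.add(key)
    bf.flush()
    print(all(bf.might_contain(key) for key in keys))  # True
    print(bf.page_writes)  # far fewer page writes than the 1000 inserts
    print(false_positive_rate(8192, 1000, 4), false_positive_rate(64 * 4096, 1000, 4))
```

Because each flush writes every dirty page exactly once, many scattered single-bit updates collapse into a bounded number of page-sized writes, which is the kind of locality grouping the abstract credits for the speedup. The real system's buffer sizing, flush policy and handling of buffered lookups may differ from this sketch.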

Similar Resources

Don't Thrash: How to Cache Your Hash on Flash

This paper presents new alternatives to the well-known Bloom filter data structure. The Bloom filter, a compact data structure supporting set insertion and membership queries, has found wide application in databases, storage systems, and networks. Because the Bloom filter performs frequent random reads and writes, it is used almost exclusively in RAM, limiting the size of the sets it can repres...

BF-Tree: Approximate Tree Indexing

The increasing volume of time-based generated data and the shift in storage technologies suggest that we might need to reconsider indexing. Several workloads like social and service monitoring often include attributes with implicit clustering because of their time-dependent nature. In addition, solid state disks (SSD) (using flash or other low-level technologies) emerge as viable competitors of...

Weak Bloom Filtering for Large-Scale Dynamic Networks

In order to forward decisions in networks where protocols play an important role we need to provide the information about the destination nodes provided by routing table states. When dynamic networks are considered the path to reach the destination may change, and corresponding states become invalid and need to be refreshed. In large, complex and highly dynamic networks, this is quite cumberso...

Implementation and Evaluation of Improved Secure Index Scheme Using Standard and Counting Bloom Filters

This paper presents an improved Secure Index scheme as a searchable symmetric encryption technique and provides a solution that enables a secure and efficient data storage and retrieval system. Secure Index scheme, conceived by Goh, is based on standard Bloom filters (SBFs). Knowledge of the limitations of SBFs, such as handling insertions but not deletions, helps in understanding the advantage...

Scalable Bloom Filters

Bloom Filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori based on the number of elements to store and the desired false positive probability, being impossible to store extra elements without increasing the false positive probability. This leads typically to a conservative assumpti...

Publication date: 2010